Learning Rewards From Linguistic Feedback
Authors
Abstract
We explore unconstrained natural language feedback as a learning signal for artificial agents. Humans use rich and varied language to teach, yet most prior work on interactive learning from language assumes a particular form of input (e.g., commands). We propose a general framework which does not make this assumption, instead using aspect-based sentiment analysis to decompose feedback into sentiment over the features of a Markov decision process. We then infer the teacher's reward function by regressing the sentiment on the features, an analogue of inverse reinforcement learning. To evaluate our approach, we first collect a corpus of teaching behavior in a cooperative task where both teacher and learner are human. We implement three artificial learners: sentiment-based "literal" and "pragmatic" models, and an inference network trained end-to-end to predict rewards. We then re-run our initial experiment, pairing human teachers with these artificial learners. All three models successfully learn from interactive human feedback. The inference network approaches the performance of the "literal" sentiment model, while the "pragmatic" model nears human performance. Our work provides insight into the information structure of naturalistic linguistic feedback as well as methods to leverage it for reinforcement learning.
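The regression step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a linear reward over features, uses plain ridge regression, and the feature names, the function `infer_reward_weights`, and the toy data are all hypothetical. In practice the sentiment scores would come from an aspect-based sentiment model applied to the teacher's utterances; here they are synthesized from known weights so the recovery can be checked.

```python
import numpy as np

# Hypothetical feature names for a toy Markov decision process.
FEATURES = ["feature_a", "feature_b", "feature_c"]


def infer_reward_weights(feature_counts, sentiments, l2=1e-3):
    """Ridge-regress scalar feedback sentiment on per-episode feature counts.

    feature_counts: (n_episodes, n_features) array, how often each feature
        occurred in the learner's trajectory for that episode.
    sentiments: (n_episodes,) array, scalar sentiment of the teacher's feedback.
    Returns an (n_features,) array of estimated reward weights.
    """
    X = np.asarray(feature_counts, dtype=float)
    y = np.asarray(sentiments, dtype=float)
    # Closed-form ridge solution: (X^T X + l2 * I)^{-1} X^T y
    gram = X.T @ X + l2 * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ y)


# Toy data (assumed, for illustration only): four episodes, three features.
true_w = np.array([2.0, -1.0, 0.5])
X = np.array(
    [
        [1.0, 0.0, 2.0],
        [0.0, 3.0, 1.0],
        [2.0, 1.0, 0.0],
        [1.0, 1.0, 1.0],
    ]
)
y = X @ true_w  # noiseless "sentiment" generated from the known weights

w_hat = infer_reward_weights(X, y, l2=1e-8)
estimated = dict(zip(FEATURES, np.round(w_hat, 3)))
```

With noiseless data and a negligible penalty, `w_hat` closely matches `true_w`. With real, noisy sentiment estimates, a larger `l2` value would likely be needed.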
Similar resources
Beyond Rewards: Learning from Richer Supervision
Recently there has been some interest in the reinforcement learning community on learning from richer feedback from the environment rather than just a scalar reward signal. In this paper we look at the question of learning from sporadic instructions from a human. Instructions can take several forms, from complete specification of policies, to directing the agent’s search to specific parts of th...
Reinforcement Learning Without Rewards
Machine learning can be broadly defined as the study and design of algorithms that improve with experience. Reinforcement learning is a variety of machine learning that makes minimal assumptions about the information available for learning, and, in a sense, defines the problem of learning in the broadest possible terms. Reinforcement learning algorithms are usually applied to “interactive” prob...
Discretionary rewards as a feedback mechanism
Article history: Received 7 March 2006; available online 19 March 2009. JEL classification: D82, J33, M50.
Learning Analytics: Readiness and Rewards
This position paper introduces the relatively new field of learning analytics, first by considering the relevant meanings of both “learning” and “analytics,” and then by looking at two main levels at which learning analytics can be or has been implemented in educational organizations. Although integrated turnkey systems or modules are not yet available for review, specific technologies for anal...
Value and probability coding in a feedback-based learning task utilizing food rewards.
For the consequences of our actions to guide behavior, the brain must represent different types of outcome-related information. For example, an outcome can be construed as negative because an expected reward was not delivered or because an outcome of low value was delivered. Thus behavioral consequences can differ in terms of the information they provide about outcome probability and value. We ...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2021
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v35i7.16749